28 research outputs found

    LIG-CRIStAL System for the WMT17 Automatic Post-Editing Task

    This paper presents the LIG-CRIStAL submission to the shared Automatic Post-Editing task of WMT 2017. We propose two neural post-editing models: a monosource model with a task-specific attention mechanism, which performs particularly well in a low-resource scenario; and a chained architecture which makes use of the source sentence to provide extra context. The latter architecture slightly improves our results when more training data is available. We present and discuss our results on the two datasets (en-de and de-en) made available for the task. Comment: keywords: neural post-edition, attention model

    Memory-efficient NLLB-200: Language-specific Expert Pruning of a Massively Multilingual Machine Translation Model

    The recently released NLLB-200 is a set of multilingual Neural Machine Translation models that cover 202 languages. The largest model is based on a Mixture of Experts architecture and achieves SoTA results across many language pairs. It contains 54.5B parameters and requires at least four 32GB GPUs just for inference. In this work, we propose a pruning method that enables the removal of up to 80% of experts without further finetuning and with a negligible loss in translation quality, which makes it feasible to run the model on a single 32GB GPU. Further analysis suggests that our pruning metrics can identify language-specific experts.

    Transferts de champs entre maillages de type éléments finis et applications numériques en mécanique non linéaire des structures

    In continuum mechanics, when a problem is solved with the finite element method, fields are known at the nodes or at the integration (Gauss) points of a given mesh of the structure. If we wish to use these results to perform a calculation on a second mesh, a data transfer is unavoidable, especially in chained studies, in mesh adaptation processes, or when coupling several codes. Numerical simulation must take this into account, which is not entirely the case today; the R&D division of EDF therefore wants tools that remove this obstacle within the open-source software Code_Aster. The manuscript summarizes the work done during the thesis, which addresses the following objectives: propose methods for field transfer, compare and qualify these approaches using theoretical and numerical error analyses, implement one of these methods in Code_Aster, and validate this implementation on some industrial cases.

    NAVER LABS Europe's Multilingual Speech Translation Systems for the IWSLT 2023 Low-Resource Track

    This paper presents NAVER LABS Europe's systems for Tamasheq-French and Quechua-Spanish speech translation in the IWSLT 2023 Low-Resource track. Our work attempts to maximize translation quality in low-resource settings using multilingual parameter-efficient solutions that leverage strong pre-trained models. Our primary submission for Tamasheq outperforms the previous state of the art by 7.5 BLEU points on the IWSLT 2022 test set, and achieves 23.6 BLEU on this year's test set, outperforming the second best participant by 7.7 points. For Quechua, we also rank first and achieve 17.7 BLEU, despite having only two hours of translation data. Finally, we show that our proposed multilingual architecture is also competitive for high-resource languages, outperforming the best unconstrained submission to the IWSLT 2021 Multilingual track, despite using much less training data and compute. Comment: IWSLT 2023: Tamasheq-French and Quechua-Spanish challenge winne

    SMaLL-100: Introducing Shallow Multilingual Machine Translation Model for Low-Resource Languages

    In recent years, multilingual machine translation models have achieved promising performance on low-resource language pairs by sharing information between similar languages, thus enabling zero-shot translation. To overcome the "curse of multilinguality", these models often opt for scaling up the number of parameters, which makes their use in resource-constrained environments challenging. We introduce SMaLL-100, a distilled version of the M2M-100 (12B) model, a massively multilingual machine translation model covering 100 languages. We train SMaLL-100 with uniform sampling across all language pairs and therefore focus on preserving the performance of low-resource languages. We evaluate SMaLL-100 on different low-resource benchmarks (FLORES-101, Tatoeba, and TICO-19) and demonstrate that it outperforms previous massively multilingual models of comparable sizes (200-600M) while improving inference latency and memory usage. Additionally, our model achieves comparable results to M2M-100 (1.2B), while being 3.6x smaller and 4.3x faster at inference. Code and pre-trained models: https://github.com/alirezamshi/small100 Comment: Accepted to EMNLP 202

    Field transfers between finite element meshes and numerical applications in non linear mechanics

    In continuum mechanics, when a problem is solved with the finite element method, fields are known at the nodes or at the integration (Gauss) points of a given mesh of the structure. If we wish to use these results to perform a calculation on a second mesh, a data transfer is unavoidable, especially in chained studies, in mesh adaptation processes, or when coupling several codes. Numerical simulation must take this into account, which is not entirely the case today; the R&D division of EDF therefore wants tools that remove this obstacle within the open-source software Code_Aster. The manuscript summarizes the work done during the thesis, which addresses the following objectives: propose methods for field transfer, compare and qualify these approaches using theoretical and numerical error analyses, implement one of these methods in Code_Aster, and validate this implementation on some industrial cases.

    Monolingual Adapters for Zero-Shot Neural Machine Translation

    We propose a novel adapter layer formalism for adapting multilingual models. These layers are more parameter-efficient than existing adapter layers while obtaining as good or better performance. They are specific to one language (as opposed to bilingual adapters), which allows them to be composed and to generalize to unseen language pairs. In this zero-shot setting, they obtain a median improvement of +2.77 BLEU points over a strong 20-language multilingual Transformer baseline trained on TED talks.

    ON THE EARTHQUAKE ACTIVITY IN THE DEEPER ZONE OF SAKURAZIMA

    Since the early hours of 29 May 1968, a great many felt earthquakes have occurred, so precise seismometric observations using a data recorder and other instruments were carried out. The results of the investigation of these earthquakes can be summarized as follows: 1) Assuming that the underground structure is homogeneous, with an average P-wave velocity of 2 km/sec or 3 km/sec, the epicenters of these earthquakes are estimated to be distributed from the center to the eastern part of Sakurajima, at depths of 2-15 km. 2) The push-pull distribution of the P waves of these earthquakes does not show any regularity. 3) The coefficient m of the Ishimoto-Iida empirical formula is 1.8, which is equivalent to that of the deeper-zone earthquakes in the volcano. 4) Since the occurrence of the earthquake swarm, the surface phenomena of the volcano have not shown any conspicuous change, and shallow-zone earthquakes near the crater have not occurred frequently either.